10 research outputs found

    Detecting One-variable Patterns

    Given a pattern $p = s_1 x_1 s_2 x_2 \cdots s_{r-1} x_{r-1} s_r$ such that $x_1, x_2, \ldots, x_{r-1} \in \{x, \overleftarrow{x}\}$, where $x$ is a variable and $\overleftarrow{x}$ its reversal, and $s_1, s_2, \ldots, s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time.
    Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
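    To make the problem concrete, here is a brute-force sketch (deliberately not the paper's $O(rn)$ algorithm, which relies on more sophisticated string machinery): given the fixed strings $s_1, \ldots, s_r$, the orientation of each variable occurrence, and an input text, it tries every substring of the text as the value of $x$ and reports where the instantiated pattern occurs. The function name and interface are illustrative.

```python
def naive_one_variable_matches(s, reversed_flags, text):
    """Find instances of the pattern s[0] x_1 s[1] x_2 ... s[r-1] in text.

    s: list of r fixed (variable-free) strings.
    reversed_flags: list of r-1 booleans; True means that occurrence of
    the variable is the reversal of x rather than x itself.
    Returns a set of (start_position, value_of_x) pairs.
    """
    matches = set()
    n = len(text)
    # Brute force: try every substring of the text as the value of x.
    for i in range(n):
        for j in range(i, n + 1):
            x = text[i:j]
            # Instantiate the pattern with this value of x.
            parts = [s[0]]
            for k, rev in enumerate(reversed_flags):
                parts.append(x[::-1] if rev else x)
                parts.append(s[k + 1])
            candidate = "".join(parts)
            # Record every starting position of the instantiated pattern.
            start = text.find(candidate)
            while start != -1:
                matches.add((start, x))
                start = text.find(candidate, start + 1)
    return matches

# Example: the pattern a x <-x b (variable followed by its reversal).
m = naive_one_variable_matches(["a", "", "b"], [False, True], "axyyxb")
print(m)  # {(0, 'xy')}
```

    This sketch runs in roughly cubic time; the point of the paper is that the same set of instances admits a compact representation computable in $O(rn)$ time.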

    iQuantitator: A tool for protein expression inference using iTRAQ

    Abstract
    Background: Isobaric Tags for Relative and Absolute Quantitation (iTRAQ™) [Applied Biosystems] have seen increased application in differential protein expression analysis. To address the growing need to analyze iTRAQ data, especially for cases involving multiple iTRAQ experiments, we have developed a modeling approach, statistical methods, and tools for estimating the relative changes in protein expression under various treatments and experimental conditions.
    Results: This modeling approach provides a unified analysis of data from multiple iTRAQ experiments and links the observed quantity (reporter ion peak area) to the experimental design and the calculated quantity of interest (treatment-dependent protein and peptide fold change) through an additive model under log transformation. Others have demonstrated this modeling approach through a case study and noted the computational challenges of parameter inference in the unbalanced data sets typical of multiple iTRAQ experiments. Here we present an inference approach, based on hierarchical regression with batching of regression coefficients and Markov chain Monte Carlo (MCMC) methods, that overcomes some of these challenges. In addition to discussing the underlying method, we present our software implementation, simulation results, experimental results, and sample output from the resulting analysis report.
    Conclusion: iQuantitator's process-based modeling approach overcomes limitations in current methods and allows for application in a variety of experimental designs. Additionally, hypertext-linked documents produced by the tool aid in the interpretation and exploration of results.
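    The core of the additive model under log transformation can be illustrated with a stdlib-only toy (iQuantitator itself fits a hierarchical Bayesian model via MCMC; the simulation below only shows the linear structure, and all names and parameter values are hypothetical):

```python
import math
import random

random.seed(0)

def simulate_peak_area(base_log2, treatment_effect, treated):
    """Reporter ion peak area on the raw scale.

    Effects are additive on the log2 scale, per the model:
    log2(area) = base + treatment effect + noise.
    """
    log2_area = base_log2 + (treatment_effect if treated else 0.0) \
        + random.gauss(0, 0.05)
    return 2.0 ** log2_area

# Hypothetical data: 50 control and 50 treated measurements of one peptide,
# with a true treatment effect of 1.0 on the log2 scale (i.e. 2-fold change).
control = [simulate_peak_area(10.0, 1.0, False) for _ in range(50)]
treated = [simulate_peak_area(10.0, 1.0, True) for _ in range(50)]

def mean_log2(xs):
    return sum(math.log2(x) for x in xs) / len(xs)

# Under the additive log model, the treatment-dependent fold change is
# estimated by the difference of mean log2 intensities.
log2_fold_change = mean_log2(treated) - mean_log2(control)
print(f"estimated log2 fold change: {log2_fold_change:.2f}")  # near 1.0
```

    The log transformation is what turns multiplicative fold changes into additive effects, which is also what makes the unified regression over multiple experiments tractable.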

    Automation of a problem list using natural language processing

    BACKGROUND: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often entirely unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. METHODS: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries for the 80 targeted problems from the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application, which is designed for management of the problem list. Within this application, the extracted problems are proposed to physicians for addition to the official problem list. RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms that first detect document sections, then sentences within these sections, and finally potential problems within those sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences.
    CONCLUSION: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information.
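    The evaluation figures above follow from the standard definitions of the two metrics. A minimal sketch, with assumed raw counts (the abstract does not report them) chosen only to reproduce the reported sentence-detection percentages:

```python
def sensitivity(tp, fn):
    """Fraction of true sentences the detector found: TP / (TP + FN)."""
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    """Fraction of detections that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)

# Assumed counts, not taken from the paper: 89 true positives,
# 6 false positives, 11 false negatives.
tp, fp, fn = 89, 6, 11
print(f"sensitivity: {sensitivity(tp, fn):.0%}")        # 89%
print(f"PPV: {positive_predictive_value(tp, fp):.0%}")  # 94%
```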

    The entrepreneurial subject as a political signifier: corpus analysis of forty years of Hansard

    This article explores the political signification of the term entrepreneur in UK parliamentary debates over the past forty years. Following a review of the literature, a need is identified to understand the construction of the entrepreneur in political discourse. The concern here is not with the prosaic cataloguing of policies or definitions, but with exploring shifts in the discursive constructs of the entrepreneur that underlie political practice. To explore these constructions, a large longitudinal dataset is systematically condensed while maintaining sensitivity to nuances of meaning. A corpus-based linguistics approach is undertaken, combining the computational analysis of significant collocates, that is, important words (concepts) that surround the term entrepreneur, with the richness of qualitative analysis. Patterns of reification, agency and structure are identified in the portrayed entrepreneurial constructs. The philosophical and practical implications of these patterns are discussed and proposals are made for using corpus techniques in international comparative analyses.
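    Collocate extraction of the kind described can be sketched in a few lines: count the words co-occurring with a node term inside a fixed window. This is only the raw-count step (the article pairs such counts with statistical significance measures over Hansard, which this toy example on a made-up sentence omits):

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count words co-occurring with `node` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # do not count the node word itself
                    counts[tokens[j]] += 1
    return counts

# Hypothetical mini-corpus, not drawn from Hansard.
text = ("the small business entrepreneur creates jobs and the "
        "entrepreneur drives growth").split()
print(collocates(text, "entrepreneur").most_common(3))
```

    On real data, the raw counts would then be ranked by a significance statistic (e.g. log-likelihood) against the word's overall corpus frequency, so that frequent function words like "the" do not dominate.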

    Developing linguistic theories using annotated corpora

    This paper aims to carve out a place for corpus research within theoretical linguistics and psycholinguistics. We argue that annotated corpora naturally complement native speaker intuitions and controlled psycholinguistic methods, and thus can be powerful tools for developing and evaluating linguistic theories. We also review basic methods and best practices for moving from corpus annotations to hypothesis formation and testing, offering practical advice and technical guidance to researchers wishing to incorporate corpus methods into their work.